Abstract. We study the universality of the eigenvalue statistics of the covariance matrices
$\frac{1}{n}M^*M$, where $M$ is a large $p \times n$ matrix obeying condition C1. In particular, as an application,
we prove a variant of universality results regarding the smallest singular value of $M_{p,n}$. This
paper is an extension of the results in [10] from the bulk of the spectrum up to the edge.



1. Introduction 

The goal of this paper is to extend the Four Moment theorem established by Tao and Vu [10] 
for iid covariance matrices from the bulk of spectrum to the edge. Let us first specify the matrix 
ensembles that will be studied. 

Definition 1.1 (Condition C1). Consider a random $p \times n$ matrix $M_{n,p} = (\zeta_{ij})_{1 \le i \le p,\, 1 \le j \le n}$, where
$p = p(n)$ is an integer parameter such that $p \le n$ and $\lim_{n \to \infty} p/n = y$ for some $0 < y \le 1$.
We say that the matrix ensemble $M$ obeys condition C1 if the random variables $\zeta_{ij}$ are jointly
independent, have mean zero and variance 1, and obey the moment condition $\sup_{i,j} \mathbf{E}|\zeta_{ij}|^{C_0} \le C$
for some constant $C$ independent of $n, p$.

Given such a $p \times n$ random matrix $M$, we form the $n \times n$ covariance matrix $W = W_{n,p} = \frac{1}{n}M^*M$.
This (non-negative) matrix has rank $p$ and the first $n - p$ eigenvalues are 0. We order its remaining
eigenvalues as

$0 \le \lambda_1(W) \le \lambda_2(W) \le \dots \le \lambda_p(W)$.

Denote by $\sigma_1(M), \dots, \sigma_p(M)$ the singular values of $M$. Notice that $\sigma_i(M) = \sqrt{n}\,\lambda_i(W)^{1/2}$.
From the singular value decomposition, there exist orthonormal bases $u_1, \dots, u_p \in \mathbb{C}^n$ and $v_1, \dots, v_p
\in \mathbb{C}^p$ such that

$M u_i = \sigma_i v_i$

and

$M^* v_i = \sigma_i u_i$.

The empirical spectral distribution (ESD) of the matrix $W$ (which is Hermitian and thus has real
eigenvalues) is the one-dimensional function

$F^W(x) = \frac{1}{p}\,|\{1 \le j \le p : \lambda_j(W) \le x\}|$,

where we use $|I|$ to denote the cardinality of a set $I$.

The first fundamental result concerning the asymptotic limiting behavior of the ESD for large covariance
matrices is the Marchenko-Pastur Law due to [5] (see also pQ).

Theorem 1.2 (Marchenko-Pastur Law). Assume a $p \times n$ random matrix $M$ obeys condition C1
with $C_0 \ge 4$, and $p, n \to \infty$ such that $\lim_{n \to \infty} p/n = y \in (0, 1]$. Then the empirical spectral distribution of
the matrix $W_{n,p} = \frac{1}{n}M^*M$ converges in distribution to the Marchenko-Pastur Law with density
function

$\rho_{MP,y}(x) := \frac{1}{2\pi x y}\sqrt{(b - x)(x - a)}\,\mathbf{1}_{[a,b]}(x)$,

where

$a := (1 - \sqrt{y})^2, \quad b := (1 + \sqrt{y})^2$.
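As a numerical illustration of Theorem 1.2, the following sketch compares the fraction of eigenvalues of $W$ in an interval with the Marchenko-Pastur mass of that interval. The matrix sizes, the Gaussian entries, the interval and all function names are illustrative assumptions, not part of the paper.

```python
import numpy as np

def mp_density(x, y):
    """Marchenko-Pastur density of Theorem 1.2; zero outside [a, b]."""
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    x = np.asarray(x, dtype=float)
    radicand = np.clip((b - x) * (x - a), 0.0, None)   # vanishes outside [a, b]
    safe_x = np.where(x > 0, x, 1.0)                    # avoid dividing by zero
    return np.where(x > 0, np.sqrt(radicand) / (2 * np.pi * safe_x * y), 0.0)

def nontrivial_eigenvalues(M):
    """The p nontrivial eigenvalues of W = M^* M / n, via the singular values of M."""
    n = M.shape[1]
    return np.sort(np.linalg.svd(M, compute_uv=False) ** 2 / n)

def interval_masses(M, lo, hi, grid_points=20001):
    """Empirical fraction of eigenvalues in [lo, hi] and the Marchenko-Pastur mass of [lo, hi]."""
    p, n = M.shape
    eigs = nontrivial_eigenvalues(M)
    grid = np.linspace(lo, hi, grid_points)
    mp_mass = np.sum(mp_density(grid, p / n)) * (grid[1] - grid[0])   # Riemann sum
    empirical_mass = np.mean((eigs >= lo) & (eigs <= hi))
    return empirical_mass, mp_mass

rng = np.random.default_rng(0)
M = rng.standard_normal((200, 400))        # illustrative sizes, y = p/n = 1/2
empirical_mass, mp_mass = interval_masses(M, 0.5, 1.5)
```

The two returned numbers should be close for moderately large matrices; the agreement improves as $n$ grows with $p/n$ fixed.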

We introduce the notion of frequent events, depending on $n$, in increasing order of likelihood.

Definition 1.3 (Frequent events). [13] Let $E$ be an event depending on $n$.

• $E$ holds asymptotically almost surely if $\mathbf{P}(E) = 1 - o(1)$.

• $E$ holds with high probability if $\mathbf{P}(E) \ge 1 - O(n^{-c})$ for some constant $c > 0$ (independent
of $n$).

• $E$ holds with overwhelming probability if $\mathbf{P}(E) \ge 1 - O_C(n^{-C})$ for every constant $C > 0$
(or equivalently, that $\mathbf{P}(E) \ge 1 - \exp(-\omega(\log n))$).

• $E$ holds almost surely if $\mathbf{P}(E) = 1$.

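Definition 1.4 (Matching). We say that two complex random variables $\zeta, \zeta'$ match to order $k$ for
some integer $k \ge 1$ if one has $\mathbf{E}\,\mathrm{Re}(\zeta)^m \mathrm{Im}(\zeta)^l = \mathbf{E}\,\mathrm{Re}(\zeta')^m \mathrm{Im}(\zeta')^l$ for all $m, l \ge 0$ with $m + l \le k$.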
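To make Definition 1.4 concrete in the real-valued case (where the imaginary part vanishes), the sketch below computes the largest order to which a finitely supported real distribution matches the standard real Gaussian. The function names and the two example distributions are illustrative assumptions.

```python
import math

def gaussian_moment(m):
    """E g^m for a standard real Gaussian g: 0 for odd m, (m-1)!! for even m."""
    if m % 2 == 1:
        return 0.0
    return float(math.prod(range(1, m, 2)))

def discrete_moment(values, probs, m):
    """E xi^m for a real random variable taking values[i] with probability probs[i]."""
    return sum(p * v ** m for v, p in zip(values, probs))

def matching_order(values, probs, max_order=8, tol=1e-9):
    """Largest k <= max_order such that all moments of order 1..k agree with the Gaussian ones."""
    k = 0
    for m in range(1, max_order + 1):
        if abs(discrete_moment(values, probs, m) - gaussian_moment(m)) > tol:
            break
        k = m
    return k

# Bernoulli +-1: matches the Gaussian to order 3 (the fourth moments are 1 and 3).
bernoulli_order = matching_order([-1.0, 1.0], [0.5, 0.5])

# Three-point law on {-sqrt(3), 0, sqrt(3)}: the fourth moment is also 3.
c = math.sqrt(3.0)
three_point_order = matching_order([-c, 0.0, c], [1 / 6, 2 / 3, 1 / 6])
```

The Bernoulli case is exactly the situation of matching to order 3 but not 4, while a suitable distribution supported on three points can already match a Gaussian to order 4.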



Our main result is the following Four Moment theorem, which extends the result (Theorem 6) in
[10] to the edge of the spectrum. The proof is analogous to the proofs in [13], [12] and [10] and
will be presented in Section 5.

Theorem 1.5 (Four Moment Theorem). For sufficiently small $c_0 > 0$ and sufficiently large $C_0 > 0$
($C_0 = 10^4$ will suffice) the following holds for every $k \ge 1$. Let $M = (\zeta_{ij})_{1 \le i \le p,\, 1 \le j \le n}$ and
$M' = (\zeta'_{ij})_{1 \le i \le p,\, 1 \le j \le n}$ be two random matrices satisfying condition C1 with the indicated constant
$C_0$, and assume that for each $i, j$ that $\zeta_{ij}$ and $\zeta'_{ij}$ match to order 4. Let $W, W'$ be the associated
covariance matrices. Assume also that $p/n \to y$ for some $0 < y \le 1$.

Let $G : \mathbb{R}^k \to \mathbb{R}$ be a smooth function obeying the derivative bounds

(1.1) $|\nabla^j G(x)| \le n^{c_0}$

for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$.

Then for any $1 \le i_1 < i_2 < \dots < i_k \le n$, and for $n$ sufficiently large depending on $k, c_0$, we have

(1.2) $|\mathbf{E}(G(n\lambda_{i_1}(W), \dots, n\lambda_{i_k}(W))) - \mathbf{E}(G(n\lambda_{i_1}(W'), \dots, n\lambda_{i_k}(W')))| \le n^{-c_0}$.



If $\zeta_{ij}$ and $\zeta'_{ij}$ only match to order 3 rather than 4, then the conclusion (1.2) still holds provided that
one strengthens (1.1) to

$|\nabla^j G(x)| \le n^{-jc_1}$

for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$ and any $c_1 > 0$, provided that $c_0$ is sufficiently small depending on
$c_1$.

The next theorem is an extension of Theorem 17 in [10], which is used in the proof of Theorem
1.5 and is of independent interest as well. The proof is delayed to Section 5.

Definition 1.6 (Gap property up to the edge). Let $M$ be a random matrix obeying condition C1.
We say $M$ obeys the gap property if for every $c > 0$ and every $1 \le i < p$, one has $|\lambda_{i+1}(W) - \lambda_i(W)| \ge
n^{-1-c}$ with high probability.

Theorem 1.7 (Gap theorem up to the edge). Let M be a random matrix satisfying condition CI. 
Then M obeys the gap property. 

Remark 1.8. When $y = 1$, the singular value statistics around $a = 0$ turn out to be different
since the density function $\rho_{MP,y}(x)$ has a singularity at $x = 0$. The hard edge is not really an edge,
which makes it easier to deal with. In this paper, we will focus on the edge case when $a > 0$.

Remark 1.9. We consider $n$ as an asymptotic parameter tending to infinity. We use $X \ll Y$,
$Y \gg X$, $Y = \Omega(X)$, or $X = O(Y)$ to denote the bound $X \le CY$ for all sufficiently large $n$ and
for some constant $C$. Notations such as $X \ll_k Y$, $X = O_k(Y)$ mean that the hidden constant
$C$ depends on another constant $k$. $X = o(Y)$ or $Y = \omega(X)$ means that $X/Y \to 0$ as $n \to \infty$;
the rate of decay here will be allowed to depend on other parameters. We write $X = \Theta(Y)$ for
$Y \ll X \ll Y$. We view vectors $x \in \mathbb{C}^n$ as column vectors. The Euclidean norm of a vector $x \in \mathbb{C}^n$
is defined as $\|x\| := (x^*x)^{1/2}$.



This paper is organized as follows: in Section 2, we prove a variant of the universality result regarding
the smallest singular value as an application of the Four Moment theorem. In Section 3, we
mention a few basic results from linear algebra and probability. In Section 4, we provide the proofs
of two technical lemmas, which are the major content of this paper. Finally, in Section 5, we give
the proofs of the Gap theorem (Theorem 1.7) and the Four Moment theorem (Theorem 1.5). The
argument draws heavily from those in [10] and [13], thus we only focus on the changes needed
to complete the proofs.



Acknowledgments: The author would like to thank Van H. Vu for useful discussion and his 
guidance through to the completion of this paper. 



2. Applications 



In a similar way as [10] (Section 1.3), equipped with the Four Moment theorem, we can obtain
universality results for large classes of random matrices. Let us demonstrate through some examples,
focusing on the results for the lower edge of the spectrum. Recall that $\sigma_1(M_{p,n})$ denotes the
smallest singular value of $M_{p,n}$.

We adapt the notation in [11]. In this section, $M_{p,n}(\xi)$ denotes the random $p \times n$ matrix whose
entries are iid copies of a (real or complex-valued) random variable $\xi$. We say $\xi$ is $\mathbb{R}$-normalized
($\mathbb{C}$-normalized) if $\xi$ is real-valued with $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 = 1$ (complex-valued with $\mathbf{E}\xi = 0$, $\mathbf{E}\,\mathrm{Re}(\xi)^2 =
\mathbf{E}\,\mathrm{Im}(\xi)^2 = 1/2$, and $\mathbf{E}\,\mathrm{Re}(\xi)\mathrm{Im}(\xi) = 0$). A complex random variable $\xi$ of mean 0 and variance 1 is
Gaussian divisible if it has the same distribution as $(1 - t)^{1/2}\xi' + t^{1/2}\xi''$ for some $0 < t < 1$, where
$\xi', \xi''$ are independent with mean 0 and variance one, with $\xi''$ complex Gaussian.

For the case when $p = n$, the limiting distribution for Gaussian models was computed by Edelman
[3]. Recently a universality result has been established by Tao and Vu [11] for entries with
bounded sufficiently high moments.

For the non-square case, where $p < n$, the distribution of the smallest singular value of
$\mathbb{R}$-normalized ($\mathbb{C}$-normalized) entries has been studied by Borodin and Forrester [3]. Later, Feldheim
and Sodin [8] proved a universality result for certain sample covariance matrices:



Theorem 2.1. Let $M_{p,n} = (\zeta_{ij})_{1 \le i \le p,\, 1 \le j \le n}$ be a random covariance matrix, where $p = p(n) < n$
tends to infinity as $n \to \infty$ and $\limsup_{n \to \infty} p/n < 1$. Let $\zeta_{ij}$ be independent for all $i, j$.

If $\zeta_{ij}$ are $\mathbb{R}$-normalized, exponentially decaying and symmetric (that is, $\zeta_{ij}$ and $-\zeta_{ij}$ have the same
distribution), then

(2.1) $\frac{\sigma_1(M_{p,n})^2 - (p^{1/2} - n^{1/2})^2}{(p^{1/2} - n^{1/2})(p^{-1/2} - n^{-1/2})^{1/3}} \to TW_1$.

If $\zeta_{ij}$ are $\mathbb{C}$-normalized, exponentially decaying and symmetric, then

(2.2) $\frac{\sigma_1(M_{p,n})^2 - (p^{1/2} - n^{1/2})^2}{(p^{1/2} - n^{1/2})(p^{-1/2} - n^{-1/2})^{1/3}} \to TW_2$,

where $TW_1, TW_2$ denote the Tracy-Widom distributions.



In the same way as in the proofs of ([13], Theorem 9) and ([10], Theorem 11), one can get the
following (also see Figure 1 for numerical simulations):



Theorem 2.2. The conclusions of Theorem 2.1 can be extended to the case when $p = p(n) < n$
tends to infinity as $n \to \infty$ and $\lim_{n \to \infty} p/n = y \in (0, 1]$, and when $M_{p,n} = (\zeta_{ij})_{1 \le i \le p,\, 1 \le j \le n}$
obeys condition C1 with sufficiently large constant $C_0$, and $\zeta_{ij}$ have vanishing third moment.



[Figure 1: panels showing the estimated probability density function and cumulative distribution function.]

Figure 1. Plotted above are the empirical PDF and CDF of the distribution of
$\sigma_1(M_{p,n}(\xi))^2$ (normalized as in (2.1)) for $n = 800$, $p = 600$, based on data from 1000
random matrices. The blue solid curves were generated with $\xi = N(0, 1)$, while the
red dashed curves were generated with $\xi$ a random Bernoulli variable, taking values
$\pm 1$ with probability 1/2 each.
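An experiment of the kind shown in Figure 1 can be sketched as follows. This is not the author's simulation code; the function names and sampler interface are hypothetical, and the normalization is the left-hand side of (2.1) as reconstructed above.

```python
import numpy as np

def normalized_smallest_sv(M):
    """Statistic on the left-hand side of (2.1) for a p x n matrix M with p < n."""
    p, n = M.shape
    sigma1 = np.linalg.svd(M, compute_uv=False)[-1]     # singular values come in descending order
    centre = (np.sqrt(p) - np.sqrt(n)) ** 2
    scale = (np.sqrt(p) - np.sqrt(n)) * (p ** -0.5 - n ** -0.5) ** (1.0 / 3.0)
    return (sigma1 ** 2 - centre) / scale

def gaussian(rng, shape):
    """Entries distributed as N(0, 1)."""
    return rng.standard_normal(shape)

def bernoulli(rng, shape):
    """Entries equal to +1 or -1 with probability 1/2 each."""
    return rng.choice([-1.0, 1.0], size=shape)

def sample_statistics(p, n, trials, sampler, seed=0):
    """Draw `trials` independent p x n matrices and return the normalized statistics."""
    rng = np.random.default_rng(seed)
    return np.array([normalized_smallest_sv(sampler(rng, (p, n))) for _ in range(trials)])

# Small illustrative run; Figure 1 uses n = 800, p = 600 and 1000 matrices.
gauss_stats = sample_statistics(60, 80, 50, gaussian)
bern_stats = sample_statistics(60, 80, 50, bernoulli)
```

Sorting each array and plotting the two empirical CDFs on the same axes gives a picture comparable to Figure 1, up to finite-size effects.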

Recently, Ben Arous and Péché proved universality at the edge for random matrices $M_{p,n}(\xi)$ with
i.i.d. entries of Gaussian divisible distribution. With the matching theorem (Corollary 30,
[13]), we can drop the third moment condition, provided that $\xi$ is assumed to be supported on at least
three points.

Theorem 2.3. The conclusion (2.2) of Theorem 2.1 can be extended to the case when $p = p(n) < n$
tends to infinity as $n \to \infty$ and $\lim_{n \to \infty} p/n = y \in (0, 1]$, and when $M_{p,n} = (\zeta_{ij})_{1 \le i \le p,\, 1 \le j \le n}$
obeys condition C1 with sufficiently large constant $C_0$, and $\zeta_{ij}$ are supported on at least three
points.



3. General Tools 

In this section, we collect some basic tools from linear algebra and probability that will be used 
repeatedly in the sequel. 



3.1. Tools from linear algebra. We start with the Cauchy interlacing law and the Weyl
inequalities.

Lemma 3.1 (Cauchy interlacing law). [10] Let $1 \le p \le n$.

(i) If $A_n$ is an $n \times n$ Hermitian matrix, and $A_{n-1}$ is an $(n-1) \times (n-1)$ minor, then
$\lambda_i(A_n) \le \lambda_i(A_{n-1}) \le \lambda_{i+1}(A_n)$ for all $1 \le i < n$.

(ii) If $M_{p,n}$ is a $p \times n$ matrix, and $M_{p-1,n}$ is a $(p-1) \times n$ minor, then $\sigma_i(M_{p,n}) \le \sigma_i(M_{p-1,n}) \le
\sigma_{i+1}(M_{p,n})$ for all $1 \le i < p$.

(iii) If $p < n$, if $M_{p,n}$ is a $p \times n$ matrix, and $M_{p,n-1}$ is a $p \times (n-1)$ minor, then $\sigma_{i-1}(M_{p,n}) \le
\sigma_i(M_{p,n-1}) \le \sigma_i(M_{p,n})$ for all $1 \le i \le p$, with the understanding that $\sigma_0(M_{p,n}) = 0$. (For
$p = n$, one can of course use the transpose of (ii) instead.)

Lemma 3.2 (Weyl inequality). [10] Let $1 \le p \le n$.

• If $A, B$ are $n \times n$ Hermitian matrices, then $|\lambda_i(A) - \lambda_i(B)| \le \|A - B\|_{op}$ for all $1 \le i \le n$.

• If $M, N$ are $p \times n$ matrices, then $|\sigma_i(M) - \sigma_i(N)| \le \|M - N\|_{op}$ for all $1 \le i \le p$.
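The singular value statements of Lemma 3.1 (ii), (iii) and Lemma 3.2 are easy to check numerically. The sketch below does so for a small real Gaussian matrix; the helper names, the sizes and the numerical tolerance are assumptions made for illustration.

```python
import numpy as np

def sv_ascending(A):
    """Singular values of A in ascending order, matching the indexing sigma_1 <= sigma_2 <= ..."""
    return np.sort(np.linalg.svd(A, compute_uv=False))

def check_row_interlacing(M, tol=1e-12):
    """Lemma 3.1 (ii): sigma_i(M) <= sigma_i(M without its last row) <= sigma_{i+1}(M)."""
    s, s_minor = sv_ascending(M), sv_ascending(M[:-1, :])
    return bool(np.all(s[:-1] <= s_minor + tol) and np.all(s_minor <= s[1:] + tol))

def check_column_interlacing(M, tol=1e-12):
    """Lemma 3.1 (iii), p < n: sigma_{i-1}(M) <= sigma_i(M without its last column) <= sigma_i(M)."""
    s, s_minor = sv_ascending(M), sv_ascending(M[:, :-1])
    shifted = np.concatenate(([0.0], s[:-1]))            # convention sigma_0 = 0
    return bool(np.all(shifted <= s_minor + tol) and np.all(s_minor <= s + tol))

def check_weyl(M, N, tol=1e-12):
    """Lemma 3.2: |sigma_i(M) - sigma_i(N)| <= operator norm of M - N."""
    gap = np.abs(sv_ascending(M) - sv_ascending(N))
    return bool(np.all(gap <= np.linalg.norm(M - N, ord=2) + tol))

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 8))
N = rng.standard_normal((5, 8))
results = (check_row_interlacing(M), check_column_interlacing(M), check_weyl(M, N))
```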



The following formula for an entry of a singular vector, in terms of the singular values and singular 
vectors of a minor, is very useful: 

Lemma 3.3 (Corollary 25, [10]). Let $p, n \ge 1$, and let

$M_{p,n} = \begin{pmatrix} M_{p,n-1} & X \end{pmatrix}$

be a $p \times n$ matrix for some $X \in \mathbb{C}^p$, and let $\begin{pmatrix} u \\ x \end{pmatrix}$ be a right unit singular vector of $M_{p,n}$ with
singular value $\sigma_i(M_{p,n})$, where $x \in \mathbb{C}$ and $u \in \mathbb{C}^{n-1}$. Suppose that none of the singular values of
$M_{p,n-1}$ are equal to $\sigma_i(M_{p,n})$. Then

$|x|^2 = \dfrac{1}{1 + \sum_{j=1}^{\min(p,n-1)} \frac{\sigma_j(M_{p,n-1})^2}{(\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2)^2}\,|v_j(M_{p,n-1})^* X|^2}$,

where $v_1(M_{p,n-1}), \dots, v_{\min(p,n-1)}(M_{p,n-1}) \in \mathbb{C}^p$ is an orthonormal system of left singular vectors
corresponding to the non-trivial singular values of $M_{p,n-1}$.

In a similar vein, if

$M_{p,n} = \begin{pmatrix} M_{p-1,n} \\ Y^* \end{pmatrix}$

for some $Y \in \mathbb{C}^n$, and $\begin{pmatrix} v \\ y \end{pmatrix}$ is a left unit singular vector of $M_{p,n}$ with singular value $\sigma_i(M_{p,n})$,
where $y \in \mathbb{C}$ and $v \in \mathbb{C}^{p-1}$, and none of the singular values of $M_{p-1,n}$ are equal to $\sigma_i(M_{p,n})$, then

$|y|^2 = \dfrac{1}{1 + \sum_{j=1}^{\min(p-1,n)} \frac{\sigma_j(M_{p-1,n})^2}{(\sigma_j(M_{p-1,n})^2 - \sigma_i(M_{p,n})^2)^2}\,|u_j(M_{p-1,n})^* Y|^2}$,

where $u_1(M_{p-1,n}), \dots, u_{\min(p-1,n)}(M_{p-1,n}) \in \mathbb{C}^n$ is an orthonormal system of right singular vectors
corresponding to the non-trivial singular values of $M_{p-1,n}$.
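The first formula of Lemma 3.3 can be tested numerically: compute $|x|^2$ once from the sum over the singular data of the minor $M_{p,n-1}$ and once directly from the singular vector of $M_{p,n}$. The sketch below uses a real Gaussian matrix with $p < n$; the function names and the 0-based ascending indexing are assumptions of this example.

```python
import numpy as np

def last_coordinate_formula(M, i):
    """Right-hand side of Lemma 3.3 for the i-th (0-based, ascending) singular value of M = (M' X)."""
    M_minor, X = M[:, :-1], M[:, -1]
    sigma_i = np.sort(np.linalg.svd(M, compute_uv=False))[i]
    # Columns of V are the left singular vectors v_j of the minor M'.
    V, s, _ = np.linalg.svd(M_minor, full_matrices=False)
    weights = s ** 2 / (s ** 2 - sigma_i ** 2) ** 2
    return 1.0 / (1.0 + np.sum(weights * np.abs(V.conj().T @ X) ** 2))

def last_coordinate_direct(M, i):
    """|x|^2 read off from the right singular vector of M for the same singular value."""
    _, s, Vh = np.linalg.svd(M, full_matrices=False)
    order = np.argsort(s)                    # ascending order of singular values
    return np.abs(Vh[order[i], -1]) ** 2     # last coordinate of the right singular vector

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 7))
pairs = [(last_coordinate_formula(M, i), last_coordinate_direct(M, i)) for i in range(4)]
```

For a generic random matrix the singular values of the minor differ from those of the full matrix, so the hypothesis of the lemma holds and each pair agrees up to rounding.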

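The next lemma is the well-known Cauchy interlacing identity:

Lemma 3.4 (Lemma 40, [13]). Let $A_n$ be an $n \times n$ Hermitian matrix, and

$A_n = \begin{pmatrix} A_{n-1} & X \\ X^* & a_{nn} \end{pmatrix}$.

Let $\lambda_i(A_n)$, $1 \le i \le n$, be the eigenvalues of $A_n$ and $\lambda_j(A_{n-1})$, $1 \le j \le n - 1$, be the eigenvalues of
$A_{n-1}$. Suppose that $X$ is not orthogonal to any of the unit eigenvectors $u_j(A_{n-1})$ of $A_{n-1}$. Then
we have

$\sum_{j=1}^{n-1} \frac{|u_j(A_{n-1})^* X|^2}{\lambda_j(A_{n-1}) - \lambda_i(A_n)} = a_{nn} - \lambda_i(A_n)$

for every $1 \le i \le n$.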
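The identity of Lemma 3.4 follows from writing out the eigenvalue equation for $A_n$ in block form, and it can be confirmed numerically. The following sketch evaluates both sides for a random complex Hermitian matrix; the function name and the test matrix are illustrative assumptions.

```python
import numpy as np

def interlacing_identity_sides(A, i):
    """Both sides of the identity in Lemma 3.4 for the i-th (0-based) eigenvalue of Hermitian A."""
    A_minor, X, a_nn = A[:-1, :-1], A[:-1, -1], A[-1, -1].real
    lam_i = np.linalg.eigvalsh(A)[i]
    mu, U = np.linalg.eigh(A_minor)          # eigenvalues and unit eigenvectors of the minor
    lhs = np.sum(np.abs(U.conj().T @ X) ** 2 / (mu - lam_i))
    rhs = a_nn - lam_i
    return lhs, rhs

rng = np.random.default_rng(3)
B = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
A = (B + B.conj().T) / 2                     # random complex Hermitian matrix
sides = [interlacing_identity_sides(A, i) for i in range(6)]
```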

From this lemma, one immediately gets an interlacing identity for singular values: 



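Lemma 3.5 (Interlacing identity for singular values). Assume the notations in Lemma 3.3. Then
for every $i$,

(3.1) $\sum_{j=1}^{\min(p,n-1)} \frac{\sigma_j(M_{p,n-1})^2\,|v_j(M_{p,n-1})^* X|^2}{\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2} = \|X\|^2 - \sigma_i(M_{p,n})^2$.

Similarly, we have

(3.2) $\sum_{j=1}^{\min(p-1,n)} \frac{\sigma_j(M_{p-1,n})^2\,|u_j(M_{p-1,n})^* Y|^2}{\sigma_j(M_{p-1,n})^2 - \sigma_i(M_{p,n})^2} = \|Y\|^2 - \sigma_i(M_{p,n})^2$.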

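Proof. Apply Lemma 3.4 to the matrix

$M_{p,n}^* M_{p,n} = \begin{pmatrix} M_{p,n-1}^* M_{p,n-1} & M_{p,n-1}^* X \\ X^* M_{p,n-1} & \|X\|^2 \end{pmatrix}$

with eigenvalue $\sigma_i(M_{p,n})^2$. Since we have $\lambda_j(M_{p,n-1}^* M_{p,n-1}) = \sigma_j(M_{p,n-1})^2$ and

$u_j(M_{p,n-1}^* M_{p,n-1})^* M_{p,n-1}^* = \sigma_j(M_{p,n-1})\,v_j(M_{p,n-1})^*$,

(3.1) follows. Similarly, to show (3.2), apply Lemma 3.4 to the matrix

$M_{p,n} M_{p,n}^* = \begin{pmatrix} M_{p-1,n} M_{p-1,n}^* & M_{p-1,n} Y \\ Y^* M_{p-1,n}^* & \|Y\|^2 \end{pmatrix}$.

□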
The Stieltjes transform $s_n(z)$ of a Hermitian matrix $W$ is defined for $z \in \mathbb{C}$ outside the spectrum of $W$ by the formula

$s_n(z) := \frac{1}{n}\sum_{i=1}^{n} \frac{1}{\lambda_i(W) - z}$.

By Schur's complement, it has the following alternate representation:

Lemma 3.6 (Lemma 39, [13]). Let $W = (\zeta_{ij})_{1 \le i,j \le n}$ be a Hermitian matrix, and let $z$ be a complex
number not in the spectrum of $W$. Then we have

$s_n(z) = \frac{1}{n}\sum_{k=1}^{n} \frac{1}{\zeta_{kk} - z - a_k^*(W_k - zI)^{-1}a_k}$,

where $W_k$ is the $(n-1) \times (n-1)$ matrix with the $k$-th row and column of $W$ removed, and $a_k \in \mathbb{C}^{n-1}$
is the $k$-th column of $W$ with the $k$-th entry removed.


3.2. Tools from probability theory. We will rely frequently on the next concentration of
measure result for projections of random vectors.

Lemma 3.7 (Lemma 43, [13]). Let $X = (\xi_1, \dots, \xi_n) \in \mathbb{C}^n$ be a random vector whose entries are
independent with mean zero, variance 1, and are bounded in magnitude by $K$ almost surely for
some $K$, where $K \ge 10(\mathbf{E}|\xi|^4 + 1)$. Let $H$ be a subspace of dimension $d$ and $\pi_H$ the orthogonal
projection onto $H$. Then

$\mathbf{P}\big(|\|\pi_H(X)\| - \sqrt{d}| \ge t\big) \le 10\exp\big(-\frac{t^2}{10K^2}\big)$.

In particular, one has

$\|\pi_H(X)\| = \sqrt{d} + O(K \log n)$

with overwhelming probability.
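A small Monte Carlo experiment illustrates the concentration in Lemma 3.7: for Bernoulli vectors, the norm of the projection onto a fixed $d$-dimensional subspace stays within $O(1)$ of $\sqrt{d}$. The subspace, the dimensions and the function name below are illustrative assumptions; the subspace is real and generated by a QR factorization.

```python
import numpy as np

def projection_norms(n, d, trials, seed=0):
    """Norms ||pi_H(X)|| for `trials` independent Bernoulli vectors X and one fixed subspace H."""
    rng = np.random.default_rng(seed)
    # Orthonormal basis (columns of Q) of a d-dimensional subspace H of R^n.
    Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
    X = rng.choice([-1.0, 1.0], size=(trials, n))   # rows: independent +-1 vectors
    return np.linalg.norm(X @ Q, axis=1)            # ||pi_H(X)|| equals ||Q^T X||

norms = projection_norms(n=400, d=100, trials=200)
max_deviation = np.max(np.abs(norms - np.sqrt(100)))
```

In runs of this kind the deviation from $\sqrt{d} = 10$ is of order one, far smaller than $\sqrt{d}$ itself, which is the behaviour the lemma quantifies.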

Lemma 3.8 (Theorem 44, [13]). Let $1 \le N \le n$ be integers, and let $A = (a_{ij})_{1 \le i \le N;\, 1 \le j \le n}$ be
an $N \times n$ complex matrix whose $N$ rows are orthonormal in $\mathbb{C}^n$, and obeying the incompressibility
condition

(3.3) $\sup_{1 \le i \le N;\, 1 \le j \le n} |a_{ij}| \le \sigma$

for some $\sigma > 0$. Let $\zeta_1, \dots, \zeta_n$ be independent complex random variables with mean zero, variance
one, and $\mathbf{E}|\zeta_i|^3 \le C$ for some $C \ge 1$. For each $1 \le i \le N$, let $S_i$ be the complex random variable

$S_i := \sum_{j=1}^{n} a_{ij}\zeta_j$

and let $\vec{S}$ be the $\mathbb{C}^N$-valued random variable with coefficients $S_1, \dots, S_N$.

• (Upper tail bound on $S_i$) For $t \ge 1$, we have $\mathbf{P}(|S_i| \ge t) \ll \exp(-ct^2) + C\sigma$ for some
absolute constant $c > 0$.

• (Lower tail bound on $\vec{S}$) For any $t \le \sqrt{N}$, one has $\mathbf{P}(|\vec{S}| \le t) \ll O(t/\sqrt{N})^{\lfloor N/4 \rfloor} +
CN^4 t^{-3}\sigma$.

The same claim holds if one of the $\zeta_i$ is assumed to have variance $c$ instead of 1 for some absolute
constant $c > 0$.
4. Main technical lemmas

Recall that in the proofs of the Four Moment Theorem and the Gap Theorem in [10], [13] and
[12], a crucial input was the Delocalization Theorem of Erdős, Schlein, and Yau ([7], [6] and
[5]). The material in this section is analogous to Section 3 of [10]. We will first extend the
concentration of ESD result to the edge of the spectrum and use this concentration theorem to
show the delocalization of singular vectors. The proof of the delocalization result at the edge of the
spectrum is significantly different from that in the bulk of the spectrum as in [10]. However, similar to
[12], the Cauchy interlacing identities for singular values in Lemma 3.5 will help us to deal with
this problem.



First observe that if $M = (\zeta_{ij})$ obeys condition C1 for some constant $C_0 > 0$, then by Markov's
inequality and the union bound, one has $|\zeta_{ij}| \le n^{10/C_0}$ for all $i, j$ with probability $1 - O(n^{-8})$. By
a truncation technique (see [5] for details) and Lemma 3.2 one may assume that

$|\zeta_{ij}| \le n^{10/C_0}$

almost surely for all $i, j$.
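The probability bound in the preceding paragraph can be spelled out as follows. This is a routine sketch that uses only the moment condition of Definition 1.1 and the fact that the matrix has $pn \le n^2$ entries:

```latex
\begin{align*}
\mathbf{P}\big(|\zeta_{ij}| > n^{10/C_0}\big)
  &\le \frac{\mathbf{E}|\zeta_{ij}|^{C_0}}{n^{10}} \le C n^{-10}
  && \text{(Markov's inequality)},\\
\mathbf{P}\big(\exists\, i,j:\ |\zeta_{ij}| > n^{10/C_0}\big)
  &\le pn \cdot C n^{-10} \le C n^{-8}
  && \text{(union bound over } pn \le n^2 \text{ entries)}.
\end{align*}
```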



We will derive the eigenvalue concentration theorem (up to the edge), which is an analogue of
Theorem 19 in [10]:

Theorem 4.1 (Concentration up to the edge). Suppose that $p/n \to y$ for some $0 < y \le 1$. Assume
$a > 0$. Let $M = (\zeta_{ij})_{1 \le i \le p,\, 1 \le j \le n}$ obey condition C1 for some $C_0 \ge 2$ and let the probability
distribution of $M$ be continuous. Assume further that $|\zeta_{ij}| \le K$ almost surely for some $K = o(n^{1/2}\delta^2\log^{-1} n)$
for all $i, j$, where $0 < \delta < 1/2$ (which can depend on $n$). Then for any interval $I \subset [a, b]$ of
length \I\ > vf- — one has with overwhelming probability (uniformly in I) that the number of 

eigenvalues $N_I$ of $W$ in $I$ obeys the concentration estimate

$\big|N_I - p\int_I \rho_{MP,y}(x)\,dx\big| \le \delta p |I|$.



As a consequence of Theorem 4.1 one can deduce the following delocalization theorem:

Theorem 4.2 (Delocalization of singular vectors up to edge). Let the hypotheses be as in Theorem
4.1. Then with overwhelming probability, all the unit left and right singular vectors of $M$ have all
coefficients uniformly of size at most $K^2 n^{-1/2}\log^{O(1)} n$.

Remark 4.3. The continuity hypothesis in the above theorems, which guarantees that the singular
values are almost surely simple, is only a technical one. In practice we are able to eliminate this
hypothesis by a limiting argument using Lemma 3.2.



4.1. Proof of Theorem 4.2. Let $u_1(M_{p,n}), \dots, u_p(M_{p,n}) \in \mathbb{C}^n$ be the right singular vectors of
$M_{p,n}$. By the union bound and symmetry, it suffices to show that

$|u_i(M_{p,n})^* e_1| \ll K^2 n^{-1/2}\log^{O(1)} n$

with overwhelming probability. The delocalization of left singular vectors can be proved similarly.

The "bulk" case is treated in [10]. Now we consider the edge case when $1 \le i \le 0.001n$ or
$0.999n \le i \le n$ (say). Using the Marchenko-Pastur law, we have with overwhelming probability
that

$|\lambda_i(W_{n,p}) - a| \le o(1)$ or $|\lambda_i(W_{n,p}) - b| \le o(1)$.
By Lemma 3.3 it suffices to show with overwhelming probability that

min(p,n-l) . , 2 

From Lemma 3.7 we conclude that $|v_j(M_{p,n-1})^* X| \ll K\log n$ with overwhelming probability for
each $j$ (and hence for all $j$, by the union bound). Then it is enough to show that with overwhelming
probability

mm(p,n-l) 2 

^ (cr_y(M Pi „_i) 2 — CTj(Mj, jn ) 2 ) 2 ~ > n-A log 

By the Cauchy-Schwarz inequality, it thus suffices to show that

with overwhelming probability for some $1 \le T_-, T_+ \ll K^2\log^{O(1)} n$. Noticing that $\sigma_j(M_{p,n-1})^2 =
\lambda_j(M_{p,n-1}^* M_{p,n-1}) = \Theta(n)$, we thus need to show

with overwhelming probability for some $1 \le T_-, T_+ \ll K^2\log^{O(1)} n$, which is equivalent to proving
that

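(4.2) $\frac{1}{n}\sum_{i - T_- \le j \le i + T_+} \frac{\sigma_j(M_{p,n-1})^2\,|v_j(M_{p,n-1})^* X|^2}{|\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2|} \gg 1$

with overwhelming probability for some $1 \le T_-, T_+ \ll K^2\log^{O(1)} n$.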
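From the interlacing identity in Lemma 3.5 we have

(4.3) $\frac{1}{n}\sum_{j=1}^{\min(p,n-1)} \frac{\sigma_j(M_{p,n-1})^2\,|v_j(M_{p,n-1})^* X|^2}{\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2} = \frac{1}{n}\|X\|^2 - \lambda_i(W_{p,n})$.

By Lemma 3.7 one gets $\frac{1}{p}\|X\|^2 = 1 + o(1)$ with overwhelming probability. And since $p/n = y + o(1)$,
one has

(4.4) $\frac{1}{n}\sum_{j=1}^{\min(p,n-1)} \frac{\sigma_j(M_{p,n-1})^2\,|v_j(M_{p,n-1})^* X|^2}{\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2} = y + o(1) - \lambda_i(W_{p,n})$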

with overwhelming probability. In order to show (4.2), we will evaluate

$\frac{1}{n}\sum_{j > i + T_+ \text{ or } j < i - T_-} \frac{\sigma_j(M_{p,n-1})^2\,|v_j(M_{p,n-1})^* X|^2}{\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2} = \sum_{j > i + T_+ \text{ or } j < i - T_-} \frac{1}{n}\,\frac{|v_j(M_{p,n-1})^* X|^2\,\lambda_j(W_{p,n-1})}{\lambda_j(W_{p,n-1}) - \lambda_i(W_{p,n})}$

for some $T_-, T_+ = K^2\log^{O(1)} n$, where $W_{p,n-1} = \frac{1}{n}M_{p,n-1}^* M_{p,n-1}$.
The Marchenko-Pastur law implies $\lambda_j(W_{p,n-1}) = \Theta(1)$ for every $1 \le j \le \min(p, n - 1)$.
Let $A > 100$ be a large constant to be chosen later. From Theorem 4.1 we have that (by taking
$\delta = \log^{-A/20} n$)

(4.6) $N_I = p\alpha_I|I| + O(|I|\,p\log^{-A/20} n)$

with overwhelming probability for any interval $I$ of length $|I| = K^2\log^A n/n$, where $\alpha_I :=
\frac{1}{|I|}\int_I \rho_{MP,y}(x)\,dx$. For such an interval, we see from Lemma 3.7 that with overwhelming probability

$\sum_{j : \lambda_j(W_{p,n-1}) \in I} \frac{1}{n}|v_j(M_{p,n-1})^* X|^2 = \frac{N_I}{n} + O\Big(\frac{\sqrt{N_I}\,K\log n}{n}\Big) + O\Big(\frac{K^2\log^2 n}{n}\Big)$

and thus by (4.6) (for $A$ large enough),

$\sum_{j : \lambda_j(W_{p,n-1}) \in I} \frac{1}{n}|v_j(M_{p,n-1})^* X|^2 = y\alpha_I|I| + O(|I|\log^{-A/20} n)$.

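Set $d_I := \frac{\mathrm{dist}(\lambda_i(W_{p,n}), I)}{|I|}$. If $d_I \ge \log n$ (say), then

$\frac{\lambda_j(W_{p,n-1})}{\lambda_j(W_{p,n-1}) - \lambda_i(W_{p,n})} = 1 + \frac{\lambda_i(W_{p,n})}{d_I|I|} + O\Big(\frac{\lambda_i(W_{p,n})}{d_I^2|I|}\Big)$

for all $j$ in the above sum, and since $\lambda_i(W_{p,n}) = \Theta(1)$, we get

(4.7) $\sum_{j : \lambda_j(W_{p,n-1}) \in I} \frac{1}{n}\,\frac{|v_j(M_{p,n-1})^* X|^2\,\lambda_j(W_{p,n-1})}{\lambda_j(W_{p,n-1}) - \lambda_i(W_{p,n})} = y\alpha_I|I|\Big(1 + \frac{\lambda_i(W_{p,n})}{d_I|I|}\Big) + O\Big(\frac{\alpha_I}{d_I^2}\Big) + O(|I|\log^{-A/20} n) + O\Big(\frac{\log^{-A/20} n}{d_I}\Big)$.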



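We now partition the real line into intervals $I$ of length $K^2\log^A n/n$, and sum (4.7) over all intervals
$I$ with $d_I \ge \log n$. Bounding $\alpha_I$ crudely by $O(1)$, we see that $\sum_I O(\alpha_I/d_I^2) = O(1/\log n) = o(1)$.
Similarly, one has

$\sum_I O(|I|\log^{-A/20} n) = O(\log^{-A/20} n) = o(1)$

and

$\sum_I O\Big(\frac{\log^{-A/20} n}{d_I}\Big) = O(\log^{-A/20} n \log n) = o(1)$.

Finally, Riemann integration of the principal value integral

$\mathrm{p.v.}\int_a^b \frac{x\rho_{MP,y}(x)}{x - \lambda_i(W_{p,n})}\,dx := \lim_{\varepsilon \to 0}\int_{a \le x \le b:\ |x - \lambda_i(W_{p,n})| > \varepsilon} \frac{x\rho_{MP,y}(x)}{x - \lambda_i(W_{p,n})}\,dx$

shows that

$\sum_I y\alpha_I|I|\Big(1 + \frac{\lambda_i(W_{p,n})}{d_I|I|}\Big) = y\,\mathrm{p.v.}\int_a^b \frac{x\rho_{MP,y}(x)}{x - \lambda_i(W_{p,n})}\,dx + o(1)$.

If $|\lambda_i(W_{p,n}) - a| \le o(1)$, using the formula for the Stieltjes transform, one obtains from residue
calculus that

$y\,\mathrm{p.v.}\int_a^b \frac{x\rho_{MP,y}(x)}{x - \lambda_i(W_{p,n})}\,dx = y\Big(1 + \lambda_i(W_{p,n})\,\mathrm{p.v.}\int_a^b \frac{\rho_{MP,y}(x)}{x - \lambda_i(W_{p,n})}\,dx\Big)$

$= y\Big(1 + (1 - \sqrt{y})^2\,\frac{1}{\sqrt{y} - y}\Big) + o(1)$

$= \sqrt{y} + o(1)$,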
thus

(4.8) $\frac{1}{n}\sum_{j > i + T_+ \text{ or } j < i - T_-} \frac{\sigma_j(M_{p,n-1})^2\,|v_j(M_{p,n-1})^* X|^2}{\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2} = \sqrt{y} + o(1)$

and in (4.4),

(4.9) $\frac{1}{n}\sum_{j=1}^{\min(p,n-1)} \frac{\sigma_j(M_{p,n-1})^2\,|v_j(M_{p,n-1})^* X|^2}{\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2} = y + o(1) - \lambda_i(W_{p,n}) = 2\sqrt{y} - 1 + o(1)$.

When $0 < y < 1$, $\sqrt{y} > 2\sqrt{y} - 1$. Thus (4.2) follows by comparing (4.8) and (4.9).



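If $|\lambda_i(W_{p,n}) - b| \le o(1)$, we have

$y\,\mathrm{p.v.}\int_a^b \frac{x\rho_{MP,y}(x)}{x - \lambda_i(W_{p,n})}\,dx = y\Big(1 + \lambda_i(W_{p,n})\,\mathrm{p.v.}\int_a^b \frac{\rho_{MP,y}(x)}{x - \lambda_i(W_{p,n})}\,dx\Big)$

$= y\Big(1 - (1 + \sqrt{y})^2\,\frac{1}{\sqrt{y} + y}\Big) + o(1)$

$= -\sqrt{y} + o(1)$,

thus

(4.10) $\frac{1}{n}\sum_{j > i + T_+ \text{ or } j < i - T_-} \frac{\sigma_j(M_{p,n-1})^2\,|v_j(M_{p,n-1})^* X|^2}{\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2} = -\sqrt{y} + o(1)$

and in (4.4),

(4.11) $\frac{1}{n}\sum_{j=1}^{\min(p,n-1)} \frac{\sigma_j(M_{p,n-1})^2\,|v_j(M_{p,n-1})^* X|^2}{\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2} = y + o(1) - \lambda_i(W_{p,n}) = -2\sqrt{y} - 1 + o(1)$.

When $0 < y \le 1$, $-\sqrt{y} > -2\sqrt{y} - 1$. Then (4.2) follows by comparing (4.10) and (4.11).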



By the concentration theorem (Theorem 4.1) and the Cauchy interlacing law, the intervals $I$ with $d_I < \log n$
will contribute at most $K^2\log^{O(1)} n$ eigenvalues and we can set $T_-, T_+$ accordingly. The proof is
now complete.



4.2. Proof of Theorem 4.1. We first have a crude upper bound on the number of eigenvalues of
$W$ in an interval. The proof can be found in Section 5.2 of [10].

Proposition 4.4 (Upper bound on ESD). Assume the hypotheses in Theorem 4.1. Then for any
interval $I \subset \mathbb{R}$ with length $|I| \gg K\log n/n$, one has

$N_I \ll n|I|$

with overwhelming probability, where $N_I$ is the number of eigenvalues in the interval $I$.



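The strategy is to compare the Stieltjes transform of the ESD of the matrix $W$,

$s(z) := \frac{1}{p}\sum_{i=1}^{p} \frac{1}{\lambda_i(W) - z}$,

with the Stieltjes transform of the Marchenko-Pastur Law

$s_{MP,y}(z) := \int_{\mathbb{R}} \frac{1}{x - z}\,\rho_{MP,y}(x)\,dx = \int_a^b \frac{1}{2\pi x y (x - z)}\sqrt{(b - x)(x - a)}\,dx$.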
Thanks to the next proposition, one gets control on the ESD through control on the Stieltjes
transforms.

Proposition 4.5 (Lemma 29, [10]). Let $1/10 \ge \eta \ge 1/n$, and $a, b, \varepsilon, \delta > 0$. Suppose that one has
the bound

$|s(z) - s_{MP,y}(z)| \le \delta$

with (uniformly) overwhelming probability for all $z$ with $a \le \mathrm{Re}(z) \le b$ and $\mathrm{Im}(z) \ge \eta$. Then for
any interval $I$ in $[a - \varepsilon, b + \varepsilon]$ with $|I| \ge \max(2\eta, \frac{\eta}{\delta}\log\frac{1}{\delta})$, one has

$\big|N_I - p\int_I \rho_{MP,y}(x)\,dx\big| \ll \delta p|I|$

with overwhelming probability.



By Proposition 4.5 our objective is to show

(4.12) $|s(z) - s_{MP,y}(z)| = o(\delta)$

with (uniformly) overwhelming probability for all $z$ with $a \le \mathrm{Re}(z) \le b$ and $\mathrm{Im}(z) \ge \eta := \frac{K^2\log^6 n}{n\delta^8}$.
Since $s_{MP,y}(z)$ is the unique solution to the equation

$s_{MP,y}(z) + \frac{1}{y + z - 1 + yz\,s_{MP,y}(z)} = 0$

in the upper half plane (see 0), we investigate a similar equation for $s(z)$.

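From Lemma 3.6 we have

(4.13) $s(z) = \frac{1}{p}\sum_{k=1}^{p} \frac{1}{\zeta_{kk} - z - Y_k}$,

where $Y_k = a_k^*(W_k - zI)^{-1}a_k$, $W_k$ is the matrix $\frac{1}{n}MM^*$ with the $k$-th row and column
removed, and $a_k$ is the $k$-th column of $\frac{1}{n}MM^*$ with the $k$-th element removed. Let $M_k$ be the $(p - 1) \times n$
minor of $M$ with the $k$-th row removed and let $X_i^* \in \mathbb{C}^n$ ($1 \le i \le p$) be the rows of $M$. Thus
$\zeta_{kk} = X_k^* X_k/n = \|X_k\|^2/n$, $a_k = \frac{1}{n}M_k X_k$, $W_k = \frac{1}{n}M_k M_k^*$. And

$Y_k = \sum_{j=1}^{p-1} \frac{|a_k^* v_j(M_k)|^2}{\lambda_j(W_k) - z} = \sum_{j=1}^{p-1} \frac{1}{n}\,\frac{\lambda_j(W_k)\,|X_k^* u_j(M_k)|^2}{\lambda_j(W_k) - z}$,

where $u_1(M_k), \dots, u_{p-1}(M_k) \in \mathbb{C}^n$ and $v_1(M_k), \dots, v_{p-1}(M_k) \in \mathbb{C}^{p-1}$ are orthonormal right
and left singular vectors of $M_k$. Here we used the facts that $a_k^* v_j(M_k) = \frac{1}{n}X_k^* M_k^* v_j(M_k) =
\frac{1}{n}\sigma_j(M_k)X_k^* u_j(M_k)$ and $\sigma_j(M_k)^2 = n\lambda_j(W_k)$.

The entries of $X_k$ are independent of each other and of $W_k$, and have mean 0 and variance 1.
Notice that $u_j(M_k)$ is a unit vector. By linearity of expectation we have

$\mathbf{E}(Y_k|W_k) = \sum_{j=1}^{p-1} \frac{1}{n}\,\frac{\lambda_j(W_k)}{\lambda_j(W_k) - z} = \frac{p-1}{n} + \frac{z}{n}\sum_{j=1}^{p-1} \frac{1}{\lambda_j(W_k) - z} = \frac{p-1}{n} + z\,\frac{p-1}{n}\,s_k(z)$,

where

$s_k(z) = \frac{1}{p-1}\sum_{j=1}^{p-1} \frac{1}{\lambda_j(W_k) - z}$

is the Stieltjes transform for the ESD of $W_k$. From the Cauchy interlacing law, we can get

$\big|s(z) - (1 - \tfrac{1}{p})s_k(z)\big| = O\Big(\frac{1}{p}\int_{\mathbb{R}} \frac{1}{|x - z|^2}\,dx\Big) = O\Big(\frac{1}{p\eta}\Big)$

and thus

$\mathbf{E}(Y_k|W_k) = \frac{p-1}{n} + z\,\frac{p}{n}\,s(z) + O\Big(\frac{1}{n\eta}\Big) = \frac{p-1}{n} + z\,\frac{p}{n}\,s(z) + o(\delta^2)$.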

In fact a similar estimate holds for $Y_k$ itself:

Proposition 4.6. For $1 \le k \le p$, $Y_k = \mathbf{E}(Y_k|W_k) + o(\delta^2)$ holds with (uniformly) overwhelming
probability for all $z$ with $a \le \mathrm{Re}(z) \le b$ and $\mathrm{Im}(z) \ge \eta$.



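Proof. Decompose

$Y_k - \mathbf{E}(Y_k|W_k) = \frac{1}{n}\sum_{j=1}^{p-1} \frac{\lambda_j(W_k)}{\lambda_j(W_k) - z}\,R_j$, where $R_j := |X_k^* u_j(M_k)|^2 - 1$.

Let $T \subset \{1, \dots, p - 1\}$. Let $H$ be the space spanned by $\{u_j(M_k)\}$ for $j \in T$ and $P_H$ be the
orthogonal projection onto $H$. Thus $\sum_{j \in T} R_j = \|P_H(X_k)\|^2 - \dim(H)$.

By Lemma 3.7 we conclude with overwhelming probability

(4.14) $\big|\sum_{j \in T} R_j\big| \ll \sqrt{|T|}\,K\log n + K^2\log^2 n$.

Using the triangle inequality,

(4.15) $\sum_{j \in T} |R_j| \ll |T| + K^2\log^2 n$.

Let $z = x + \sqrt{-1}\,\eta$, where $\eta = \frac{K^2\log^6 n}{n\delta^8}$ and $a \le x \le b$. We will use two auxiliary parameters
$\alpha = \delta^2\log^{-1.1} n$, $\delta' = \delta^2\log^{-0.1} n$ in the later estimates.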



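First, for those $j \in T$ such that $|\lambda_j(W_k) - x| \le \delta'\eta$, the function $\frac{\lambda_j(W_k)}{\lambda_j(W_k) - x - \sqrt{-1}\,\eta}$ has magnitude
$O(1/\eta)$. From Proposition 4.4, $|T| \ll n\delta'\eta$, so the contribution of these $j \in T$ is

$\Big|\frac{1}{n}\sum_{j \in T} \frac{\lambda_j(W_k)}{\lambda_j(W_k) - z}\,R_j\Big| \ll \frac{1}{n\eta}\sum_{j \in T}|R_j| \ll \frac{|T| + K^2\log^2 n}{n\eta} = o(\delta^2)$.

For the contribution of the remaining indices, we subdivide them as

$(1 + \alpha)^l\delta'\eta \le |\lambda_j(W_k) - x| \le (1 + \alpha)^{l+1}\delta'\eta$

for $0 \le l \ll \log n/\alpha$, and then sum over $l$.

For each such interval, the function $\frac{\lambda_j(W_k)}{\lambda_j(W_k) - x - \sqrt{-1}\,\eta}$ has magnitude $O\big(\frac{1}{(1+\alpha)^l\delta'\eta}\big)$ and fluctuates by
at most $O\big(\frac{\alpha}{(1+\alpha)^l\delta'\eta}\big)$. Say $T(l)$ is the set of all $j$ in this interval; by Proposition 4.4, $|T(l)| \ll
n\alpha(1 + \alpha)^l\delta'\eta$. Together with the bounds (4.14), (4.15), the contribution of these $j$ on such an interval is

$\Big|\frac{1}{n}\sum_{j \in T(l)} \frac{\lambda_j(W_k)}{\lambda_j(W_k) - z}\,R_j\Big| \ll \frac{1}{(1+\alpha)^l\delta'\eta}\cdot\frac{1}{n}\Big(\sqrt{n\alpha(1+\alpha)^l\delta'\eta}\,K\log n + K^2\log^2 n\Big) + \frac{\alpha}{(1+\alpha)^l\delta'\eta}\cdot\frac{n\alpha(1+\alpha)^l\delta'\eta + K^2\log^2 n}{n}$

$\ll \frac{\sqrt{\alpha}}{\sqrt{\delta'\eta n}}\,K\log n + \frac{K^2\log^2 n}{\delta'\eta n} + \alpha^2$.

Summing over $l$ (taking into account that $l \ll \log n/\alpha$), we get

$\Big|\frac{1}{n}\sum_{l}\sum_{j \in T(l)} \frac{\lambda_j(W_k)}{\lambda_j(W_k) - z}\,R_j\Big| \ll \frac{K\log^2 n}{\sqrt{\alpha\delta'\eta n}} + \frac{K^2\log^3 n}{\alpha\delta'\eta n} + \alpha\log n = o(\delta^2)$.

□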



Recall that $s_{MP,y}(z)$ has an explicit expression

$s_{MP,y}(z) = -\frac{y + z - 1 - \sqrt{(y + z - 1)^2 - 4yz}}{2yz}$,

where we take the branch of $\sqrt{(y + z - 1)^2 - 4yz}$ with cut at $[a, b]$ that is asymptotically $y + z - 1$
as $z \to \infty$.
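The explicit expression can be checked against both the self-consistent equation and the integral definition of $s_{MP,y}$. In the sketch below the branch with cut $[a, b]$ is realized as $\sqrt{z - a}\,\sqrt{z - b}$ with principal square roots, using $(y + z - 1)^2 - 4yz = (z - a)(z - b)$; the function names and the test point are illustrative assumptions.

```python
import numpy as np

def s_mp(z, y):
    """Explicit Stieltjes transform of the Marchenko-Pastur law at a non-real point z."""
    z = complex(z)
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    # (y + z - 1)^2 - 4yz = (z - a)(z - b); this product of principal roots has its cut on [a, b].
    root = np.sqrt(z - a) * np.sqrt(z - b)
    return -(y + z - 1 - root) / (2 * y * z)

def s_mp_integral(z, y, grid_points=200001):
    """Riemann-sum approximation of the integral of rho_{MP,y}(x) / (x - z) over [a, b]."""
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    x = np.linspace(a, b, grid_points)
    rho = np.sqrt((b - x) * (x - a)) / (2 * np.pi * x * y)
    return np.sum(rho / (x - z)) * (x[1] - x[0])

y, z = 0.5, 1.0 + 0.5j
s = s_mp(z, y)
residual = abs(s + 1.0 / (y + z - 1 + y * z * s))   # defect in the self-consistent equation
```

The residual should vanish up to rounding, and the imaginary part of `s` should be positive for $z$ in the upper half plane.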



From (4.13) and Proposition 4.6 we have with overwhelming probability that

$s(z) + \frac{1}{\frac{p}{n} + z - 1 + z\,\frac{p}{n}\,s(z) + o(\delta^2)} = 0$,

where we used Lemma 3.7 to obtain that $\zeta_{kk} = \|X_k\|^2/n = 1 + o(\delta^2)$ with overwhelming probability.

By the assumption $p/n \to y$, when $n$ is large enough,

(4.16) $s(z) + \frac{1}{y + z - 1 + yz\,s(z) + o(\delta^2)} = 0$

holds with overwhelming probability.



In (4.16), for the error term $o(\delta^2)$, one has either $\frac{o(\delta^2)}{y + z - 1 + yz\,s(z)} = o(\delta^2)$ or $y + z - 1 + yz\,s(z) = o(1)$.
In the latter case, we get $s(z) = -\frac{y + z - 1}{yz} + o(1)$. In the first case, we apply a Taylor expansion
to (4.16):

$s(z)(y + z - 1 + yz\,s(z)) + 1 + o(\delta^2) = 0$.

Completing the square for $s(z)$ in the above identity, one can solve the equation for $s(z)$:

(4.17) $\sqrt{yz}\,\Big(s(z) + \frac{y + z - 1}{2yz}\Big) = \pm\sqrt{\frac{(y + z - 1)^2}{4yz} - 1 + o(\delta^2)}$.

If $\frac{(y + z - 1)^2}{4yz} - 1$ is not $o(\delta^2)$, by a Taylor expansion on the right hand side of (4.17), we have
$\sqrt{yz}\,\big(s(z) + \frac{y + z - 1}{2yz}\big) = \pm\sqrt{\frac{(y + z - 1)^2}{4yz} - 1} + o(\delta)$. Therefore, $s(z) = s_{MP,y}(z) + o(\delta)$ or
$s(z) = s_{MP,y}(z) - \frac{\sqrt{(y + z - 1)^2 - 4yz}}{yz} + o(\delta) = -s_{MP,y}(z) - \frac{y + z - 1}{yz} + o(\delta)$. If
$\frac{(y + z - 1)^2}{4yz} - 1 = o(\delta^2)$, from (4.17) and the
explicit formula for $s_{MP,y}(z)$, we still have $s(z) = s_{MP,y}(z) + o(\delta)$.
To summarize the above discussion, one has, with overwhelming probability, either

(4.18) $s(z) = s_{MP,y}(z) + o(\delta)$

or

(4.19) $s(z) = s_{MP,y}(z) - \frac{\sqrt{(y + z - 1)^2 - 4yz}}{yz} + o(\delta) = -s_{MP,y}(z) - \frac{y + z - 1}{yz} + o(\delta)$

or

(4.20) $s(z) = -\frac{y + z - 1}{yz} + o(1)$.

We may assume the above trichotomy holds for all $z = x + \sqrt{-1}\,\eta$ with $a \le x \le b$ and $\eta_0 \le \eta \le n^{10}/\delta$,
where $\eta_0 = \frac{K^2\log^6 n}{n\delta^8}$.

When $\eta = n^{10}/\delta$, from $|s(z)| \le 1/\eta$ and $|s_{MP,y}(z)| \le 1/\eta$, we see that $s(z)$ and $s_{MP,y}(z)$ are both
$o(\delta)$, and therefore (4.18) holds in this case. By continuity, we conclude that either (4.18) holds in
the domain of interest or there exists some $z$ in the domain such that (4.18) and (4.19), or (4.18)
and (4.20), hold together.



On the other hand, (4.18) and (4.20) cannot hold at the same time. Otherwise, $s_{MP,y}(z) + \frac{y + z - 1}{yz} =
o(1)$. However, from $s_{MP,y}(z)\big(s_{MP,y}(z) + \frac{y + z - 1}{yz}\big) = -\frac{1}{yz}$ and the boundedness of $|s_{MP,y}(z)|$ in the domain of interest, one can
see that $\big|s_{MP,y}(z) + \frac{y + z - 1}{yz}\big|$ is bounded from below, which implies a contradiction.

Similarly, (4.18) and (4.19) cannot both hold except when $(y + z - 1)^2 - 4yz = o(\delta^2)$. Otherwise,
we can conclude that $2s_{MP,y}(z) + \frac{y + z - 1}{yz} = o(\delta)$. From the explicit formula of $s_{MP,y}(z)$,

$2s_{MP,y}(z) + \frac{y + z - 1}{yz} = \frac{\sqrt{(y + z - 1)^2 - 4yz}}{yz}$.

One can conclude that $\big|2s_{MP,y}(z) + \frac{y + z - 1}{yz}\big| \ge C\delta$, which contradicts our assertion. Actually, if
$(y + z - 1)^2 - 4yz = o(\delta^2)$, (4.18) and (4.19) are equivalent.

In conclusion, (4.18) holds with overwhelming probability in the domain of interest.



5. Gap theorem and Four Moment theorem 



In this section, we complete the proofs of the main results, Theorem 1.5 and Theorem 1.7. The
proofs follow closely those in [10] (as well as in [13], [12]), so we shall focus on the changes needed
to that argument. We assume substantial familiarity with the materials in [13], [12], and will cite
from them repeatedly.



It is convenient to use the augmented matrix

(5.1) $\mathbf{M} := \begin{pmatrix} 0 & M^* \\ M & 0 \end{pmatrix}$,

which is a $(p + n) \times (p + n)$ Hermitian matrix with eigenvalues $\pm\sigma_1(M), \dots, \pm\sigma_p(M)$ and $n - p$
zeros. In this way, we can import the results obtained in [13], [12] and [10] to the model discussed
in this paper.
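The spectrum of the augmented matrix (5.1) is simple to confirm numerically. The sketch below builds the block matrix for a small real $M$ and compares its eigenvalues with $\pm\sigma_i(M)$ padded by $n - p$ zeros; the function name and the sizes are assumptions of this example.

```python
import numpy as np

def augmented(M):
    """The (n + p) x (n + p) Hermitian matrix with blocks [[0, M^*], [M, 0]] for a p x n matrix M."""
    p, n = M.shape
    return np.block([[np.zeros((n, n)), M.conj().T],
                     [M, np.zeros((p, p))]])

rng = np.random.default_rng(5)
M = rng.standard_normal((3, 5))
sigma = np.linalg.svd(M, compute_uv=False)
expected = np.sort(np.concatenate([-sigma, np.zeros(5 - 3), sigma]))
eigenvalues = np.linalg.eigvalsh(augmented(M))      # returned in ascending order
```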
As mentioned in the beginning of Section 5, one can assume that

$|\zeta_{ij}| \le n^{10/C_0}$

almost surely for all $i, j$. We also assume that the distributions of $M, M'$ are continuous to ensure that the singular values are almost surely simple.

Let us first state a weaker version of the Four Moment Theorem, in which we assume the gap property for the matrices considered:

Theorem 5.1 (Four Moment theorem with Gap assumption). For sufficiently small $c_0 > 0$ and sufficiently large $C_0 > 0$ ($C_0 = 10^4$ will suffice) the following holds for every $k \ge 1$. Let $M = (\zeta_{ij})_{1 \le i \le p, 1 \le j \le n}$ and $M' = (\zeta'_{ij})_{1 \le i \le p, 1 \le j \le n}$ be two random matrices satisfying condition C1 with the indicated constant $C_0$, and assume that for each $i, j$, $\zeta_{ij}$ and $\zeta'_{ij}$ match to order 4. Let $W, W'$ be the associated covariance matrices. Assume also that $M, M'$ obey the gap property and $p/n \to y$ for some $0 < y \le 1$.

Let $G : \mathbb{R}^k \to \mathbb{R}$ be a smooth function obeying the derivative bounds

(5.2) $|\nabla^j G(x)| \le n^{c_0}$

for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$.

Then for any $1 \le i_1 < i_2 < \cdots < i_k \le n$, and for $n$ sufficiently large depending on $k, c_0$, we have

(5.3) $|\mathbf{E}(G(n\lambda_{i_1}(W), \ldots, n\lambda_{i_k}(W))) - \mathbf{E}(G(n\lambda_{i_1}(W'), \ldots, n\lambda_{i_k}(W')))| \le n^{-c_0}$.

If $\zeta_{ij}$ and $\zeta'_{ij}$ only match to order 3 rather than 4, then the conclusion (5.3) still holds provided that one strengthens (5.2) to

$|\nabla^j G(x)| \le n^{-jc_1}$

for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$ and any $c_1 > 0$, provided that $c_0$ is sufficiently small depending on $c_1$.

The Four Moment theorem follows directly from Theorem 1.7 and Theorem 5.1. The next two sections are devoted to the proofs of Theorem 1.7 and Theorem 5.1.

5.1. Proof of Theorem 5.1. The key technical step (also used in proving Theorem 1.7) is the truncated Four Moment Theorem, which follows by applying [12, Proposition 6.1 and Proposition 6.2] (or see [10, Proposition 35]) to the augmented matrix $\mathbf{M}$. The proof is omitted here.



Theorem 5.2 (Truncated four moment theorem). For sufficiently small $c_0 > 0$ and sufficiently large $C_0 > 0$ ($C_0 = 10^4$ will suffice) the following holds for every $k \ge 1$. Let $M = (\zeta_{ij})_{1 \le i \le p, 1 \le j \le n}$ and $M' = (\zeta'_{ij})_{1 \le i \le p, 1 \le j \le n}$ be two random matrices satisfying condition C1 with the indicated constant $C_0$, and assume that for each $i, j$, $\zeta_{ij}$ and $\zeta'_{ij}$ match to order 4. Assume also that $|\zeta_{ij}|, |\zeta'_{ij}| \le n^{10/C_0}$ and $p/n \to y$ for some $0 < y \le 1$.

Let $G : \mathbb{R}^k \times \mathbb{R}_+^k \to \mathbb{R}$ be a smooth function obeying the derivative bounds

(5.4) $|\nabla^j G(x_1, \ldots, x_k, q_1, \ldots, q_k)| \le n^{c_0}$

for all $0 \le j \le 5$ and $x_1, \ldots, x_k \in \mathbb{R}$, $q_1, \ldots, q_k \in \mathbb{R}_+$, and such that $G$ is supported on the region $q_1, \ldots, q_k \le n^{c_0}$, and the gradient $\nabla$ is in all $2k$ variables.

Then for any $1 \le i_1 < i_2 < \cdots < i_k \le n$, and for $n$ sufficiently large depending on $\varepsilon, k, c_0$, we have

(5.5) $|\mathbf{E}(G(\sqrt{n}\sigma_{i_1}(M), \ldots, \sqrt{n}\sigma_{i_k}(M), Q_{i_1}(M), \ldots, Q_{i_k}(M))) - \mathbf{E}(G(\sqrt{n}\sigma_{i_1}(M'), \ldots, \sqrt{n}\sigma_{i_k}(M'), Q_{i_1}(M'), \ldots, Q_{i_k}(M')))| \le n^{-c_0}$.

If $\zeta_{ij}$ and $\zeta'_{ij}$ only match to order 3 rather than 4, then the conclusion (5.5) still holds provided that one strengthens (5.4) to

$|\nabla^j G(x_1, \ldots, x_k, q_1, \ldots, q_k)| \le n^{-jc_1}$

for any $c_1 > 0$, provided that $c_0$ is sufficiently small depending on $c_1$.



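As in the arguments in Section 6 of [10], we use the quantities, for $1 \le i \le p$,

$$Q_i(M) := \sum_{\lambda \ne \sigma_i(M)} \frac{1}{|\sqrt{n}(\lambda - \sigma_i(M))|^2} = \frac{1}{n}\left(\frac{n-p}{\sigma_i(M)^2} + \sum_{1 \le j \le p} \frac{1}{(\sigma_j(M) + \sigma_i(M))^2} + \sum_{1 \le j \le p,\, j \ne i} \frac{1}{(\sigma_j(M) - \sigma_i(M))^2}\right),$$

where $\lambda$ ranges over the eigenvalues of the augmented matrix $\mathbf{M}$.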

The gap property (up to the edge) on $M$ ensures an upper bound on $Q_i(M)$. The proof repeats exactly the proof of Lemma 32 in [10].

Proposition 5.3. If $M$ satisfies the gap property, then for any $c_0 > 0$ (independent of $n$), and any $1 \le i \le p$, one has $Q_i(M) \le n^{c_0}$ with high probability.



Now Theorem 5.1 follows by defining $\tilde{G}$ to be

$$\tilde{G}(\sqrt{n}\sigma_{i_1}, \ldots, \sqrt{n}\sigma_{i_k}, Q_{i_1}, \ldots, Q_{i_k}) := G(\sqrt{n}\sigma_{i_1}, \ldots, \sqrt{n}\sigma_{i_k}) \prod_{j=1}^{k} \eta(Q_{i_j}),$$

where $\eta(x)$ is a smooth cutoff to the region $x \le n^{c_0}$ which equals 1 on $x \le n^{c_0}/2$. From Proposition 5.3 we have

$$|\mathbf{E}(G(\sqrt{n}\sigma_{i_1}(M), \ldots, \sqrt{n}\sigma_{i_k}(M))) - \mathbf{E}(\tilde{G}(\sqrt{n}\sigma_{i_1}(M), \ldots, \sqrt{n}\sigma_{i_k}(M), Q_{i_1}(M), \ldots, Q_{i_k}(M)))| \ll n^{-c}$$

for some $c > 0$, and a similar relation holds for $M'$. The proof is completed by combining the above relations with Theorem 5.2.



5.2. Proof of the Gap Theorem. We first establish a gap theorem under an additional exponential decay hypothesis on the entries of $M$. The proof is presented in Section 5.3.

Theorem 5.4 (Gap theorem up to the edge). Let $M = (\zeta_{ij})_{1 \le i \le p, 1 \le j \le n}$ be a random matrix obeying condition C1, and suppose the entries $\zeta_{ij}$ satisfy exponential decay in the sense that $\mathbf{P}(|\zeta_{ij}| \ge t^C) \le \exp(-t)$ for all $t \ge C'$, for all $i, j$ and some constants $C, C' > 0$. Then $M$ obeys the gap property.



The next observation is the following matching lemma (see Lemma 33 in [10]), which together with Theorem 5.2 allows us to remove the exponential decay hypothesis in Theorem 5.4.

Lemma 5.5 (Matching lemma, [10]). Let $\zeta$ be a complex random variable with mean zero, unit variance, and third moment bounded by some constant $a$. Then there exists a complex random variable $\zeta'$ with support bounded by the ball of radius $O_a(1)$ centered at the origin (and in particular, obeying the exponential decay hypothesis uniformly in $\zeta$, for fixed $a$) which matches $\zeta$ to third order.
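
To illustrate the matching lemma in the simplest setting, consider a real random variable with mean zero, unit variance and third moment $\gamma$. A two-point law already matches these three moments, and its support is bounded in terms of $\gamma$ alone. This is only a sketch of the real case: the lemma itself concerns complex variables, and this is not claimed to be the construction used in [10]. The names `two_point_match` and `moment` are hypothetical.

```python
import math

def two_point_match(gamma):
    """Two-point law (a, P(a), b, P(b)) with mean 0, variance 1, third moment gamma.

    Real-valued illustration only; derived by solving the three moment equations.
    """
    q = 0.5 * (1.0 - gamma / math.sqrt(4.0 + gamma * gamma))  # P(X = a), always in (0, 1)
    a = math.sqrt((1.0 - q) / q)      # positive atom
    b = -math.sqrt(q / (1.0 - q))     # negative atom
    return a, q, b, 1.0 - q

def moment(law, k):
    """k-th moment of a two-point law given as (a, P(a), b, P(b))."""
    a, qa, b, qb = law
    return qa * a**k + qb * b**k

# Example: match a third moment of 0.5 with a bounded two-point variable.
law = two_point_match(0.5)
support_radius = max(abs(law[0]), abs(law[2]))
```

Here `support_radius` depends only on the prescribed third moment, mirroring the $O_a(1)$ support bound in the lemma.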



Now consider the matrix $M = (\zeta_{ij})_{1 \le i \le p, 1 \le j \le n}$ in Theorem 1.7. By the matching lemma, we can find a random matrix $M' = (\zeta'_{ij})_{1 \le i \le p, 1 \le j \le n}$ such that $\zeta'_{ij}$ satisfies the exponential decay hypothesis and $\zeta'_{ij}$ matches $\zeta_{ij}$ to third order for each $i, j$. By Theorem 5.4, the matrix $M'$ obeys the gap property. Similarly as in Section 6 of [10], let $\eta(x)$ be a smooth cutoff to the region $x \le n^{c_0}$. Then by Proposition 5.3, $\mathbf{E}\,\eta(Q_i(M')) = 1 - O(n^{-c_1})$, which, by using Theorem 5.2, implies that $\mathbf{E}\,\eta(Q_i(M)) = 1 - O(n^{-c_2})$ for some $c_2$ independent of $n$. Hence, $M$ also satisfies the gap property.



5.3. Proof of Theorem 5.4. The proof follows closely that discussed in Section 5 of [12]. We shall mainly mention the corresponding changes. Interested readers can find the detailed proofs in [13]. First, in order to run an induction argument, we need to treat the edge cases $i = 1$ and $i = p$ separately.

Proposition 5.6 (Extreme cases). Theorem 5.4 is true when $i = 1$ or $i = p$.



Proof. By symmetry, it suffices to show the claim for $i = p$. Consider the interlacing identity (Lemma 3.5),

$$\sum_{j=1}^{p-1} \frac{\sigma_j(M_{p-1,n})^2 \, |u_j(M_{p-1,n})^* Y|^2}{\sigma_j(M_{p-1,n})^2 - \sigma_p(M_{p,n})^2} = \|Y\|^2 - \sigma_p(M_{p,n})^2.$$

On the right-hand side of the above identity, $Y \in \mathbb{C}^n$, and by Lemma 3.7, $\|Y\|^2/n = 1 + o(1)$ with overwhelming probability. Also, $\sigma_p(M_{p,n})^2/n = \lambda_p(W) = b + o(1)$ with overwhelming probability. All the terms on the left-hand side have negative sign, thus



^Oj-(M p _i, n ) 2 |Mj(Mp-i, n )*y| s 



< 1. 



From Theorem 4.2 and Lemma 3.8, one can conclude that $|u_{p-1}(M_{p-1,n})^* Y|^2 \ge n^{-c/10}$ with high probability. Therefore, $|\sigma_j(M_{p-1,n})^2 - \sigma_p(M_{p,n})^2| \ge n^{-c}$ with high probability. The conclusion follows from the Cauchy interlacing law. □



For the general case of the gap theorem, we write $i_0, p_0$ instead of $i, p$ and define $N_0 := p_0 + n$. As in [13], we introduce the regularized gap

(5.6) $g_{i,l,p} := \inf_{1 \le i_- \le i-l < i \le i_+ \le p} \dfrac{\sqrt{N_0}\,\sigma_{i_+}(M_{p,n}) - \sqrt{N_0}\,\sigma_{i_-}(M_{p,n})}{\min(i_+ - i_-, \log^{C_1} N_0)^{\log^{0.9} N_0}}$,

where $C_1 > 1$ is a large constant to be determined later. To show Theorem 5.4, it is enough to show that the event

$g_{i_0,1,p_0} \le n^{-c_0}$

fails with high probability for $1 \le i_0 \le p_0$.
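
Reading (5.6) as an infimum over index pairs $i_- \le i - l < i \le i_+$ of a normalized difference of singular values, the regularized gap can be evaluated by brute force for a given list of singular values. The sketch below assumes this reading of (5.6); the function name `regularized_gap` and the default value of the constant `c1` are hypothetical choices made only for illustration.

```python
import math

def regularized_gap(sigma, i, l, n0, c1=2.0):
    """Brute-force evaluation of the regularized gap g_{i,l,p}.

    sigma: singular values in increasing order (sigma[0] is sigma_1), p = len(sigma).
    i, l:  1-based index i and window parameter l with 1 <= l < i <= p.
    n0:    the normalizing parameter N_0; c1 is the constant C_1 (assumed value).
    Returns math.inf if no admissible pair (i_minus, i_plus) exists.
    """
    p = len(sigma)
    scale = math.sqrt(n0)
    log_n0 = math.log(n0)
    cap = log_n0 ** c1        # log^{C_1} N_0
    power = log_n0 ** 0.9     # log^{0.9} N_0
    best = math.inf
    for i_minus in range(1, i - l + 1):       # 1 <= i_minus <= i - l
        for i_plus in range(i, p + 1):        # i <= i_plus <= p
            num = scale * (sigma[i_plus - 1] - sigma[i_minus - 1])
            den = min(i_plus - i_minus, cap) ** power
            best = min(best, num / den)
    return best
```

When $l = 1$, the pair $i_- = i - 1$, $i_+ = i$ has denominator 1, so the regularized gap is at most the rescaled neighbouring gap $\sqrt{N_0}(\sigma_i - \sigma_{i-1})$. A small regularized gap can also come from many singular values clustering over a longer window.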



By repeating the arguments in Section 3.5 of [13], the proof relies on the following two key propositions. The idea is to propagate a narrow gap for $M_{p,n}$ backwards in $p$ until one can use Theorem 4.1 to control the occurrence of the gap.

Proposition 5.7 (Backwards propagation of gap). Suppose $p_0/2 \le p < p_0$ and $l \le p/10$ is such that

(5.7) $g_{i_0,l,p+1} \le \delta$

for some $0 < \delta \le 1$ (which can depend on $n$), and that

(5.8) $\sqrt{n}\,\sigma_{p+1}(M_{p+1,n}) - \sqrt{n}\,\sigma_p(M_{p+1,n}) \ge \delta \exp(\log^{0.91} n)$.

Then $i_0 \le p$. Suppose further that

$g_{i_0,l+1,p} \ge 2^m g_{i_0,l,p+1}$

for some $m \ge 0$ with

$2^m \le \delta^{-1/2}$.

Let $X_{p+1}$ be the $(p+1)$th row of $M_{p_0,n}$, and let $u_1(M_{p,n}), \ldots, u_p(M_{p,n})$ be an orthonormal system of right singular vectors of $M_{p,n}$ associated to $\sigma_1(M_{p,n}), \ldots, \sigma_p(M_{p,n})$. Then one of the following statements holds:

(i) (Macroscopic spectral concentration) There exists $1 \le i_- < i_+ \le p+1$ with $i_+ - i_- \ge \log^{C_1/2} n$ such that $|\sqrt{n}\,\sigma_{i_+}(M_{p+1,n}) - \sqrt{n}\,\sigma_{i_-}(M_{p+1,n})| \le \delta^{1/4} \exp(\log^{0.95} n)(i_+ - i_-)$.

(ii) (Small inner products) There exists $1 \le i_- \le i_0 - l < i_0 \le i_+ \le p$ with $i_+ - i_- \le \log^{C_1/2} n$ such that

$$\sum_{i_- \le j < i_+} |X_{p+1}^* u_j(M_{p,n})|^2 \le \frac{i_+ - i_-}{2^{m/2} \log^{0.01} n}.$$

(iii) (Large singular value) For some $1 \le i \le p+1$ one has

$$|\sigma_i(M_{p+1,n})| \ge \frac{\sqrt{n}\,\exp(-\log^{0.96} n)}{\delta^{1/2}}.$$

(iv) (Large inner product) There exists $1 \le i \le p$ such that

(v) (Large row) We have

$$\|X_{p+1}\|^2 \ge \frac{n \exp(-\log^{0.96} n)}{\delta^{1/2}}.$$

(vi) (Large inner product near $i_0$) There exists $1 \le i \le p$ with $|i - i_0| \le \log^{C_1} n$ such that

$$|X_{p+1}^* u_i(M_{p,n})|^2 \ge 2^{m/2} \log^{0.8} n.$$



Proof. Apply Lemma 5.3 in [12] to the augmented matrices

$$A_{p+n+1} := \sqrt{n}\begin{pmatrix} 0 & M_{p+1,n}^* \\ M_{p+1,n} & 0 \end{pmatrix} \quad \text{and} \quad A_{p+n} := \sqrt{n}\begin{pmatrix} 0 & M_{p,n}^* \\ M_{p,n} & 0 \end{pmatrix}.$$

Notice that $A_{p+n}$ is $A_{p+n+1}$ with the rightmost column and bottom row (which consist of $X_{p+1}$ and $p+1$ zeros) removed. The eigenvalues of $A_{p+n}$ are $\pm\sqrt{n}\,\sigma_1(M_{p,n}), \ldots, \pm\sqrt{n}\,\sigma_p(M_{p,n})$ and 0, and an orthonormal eigenbasis includes the vectors $\frac{1}{\sqrt{2}}\begin{pmatrix} u_j(M_{p,n}) \\ v_j(M_{p,n}) \end{pmatrix}$ for $1 \le j \le p$. (The "Large coefficient" event in Lemma 5.3 of [12] cannot occur as $A_{p+n+1}$ has zero diagonal.) □

The next proposition claims that the events (i)-(vi) occur with small probability.

Proposition 5.8 (Bad events are rare). Suppose that $p_0/2 \le p < p_0$ and $l \le p/10$, and set $\delta := n_0^{-\kappa}$ for some sufficiently small fixed $\kappa > 0$. Then

(a) The events (i), (iii), (iv), (v) in Proposition 5.7 all fail with high probability.

(b) There is a constant $C'$ such that all the coefficients of the right singular vectors $u_j(M_{p,n})$ for $1 \le j \le p$ are of magnitude at most $n^{-1/2} \log^{C'} n$ with overwhelming probability. Conditioning $M_{p,n}$ to be a matrix with this property, the events (ii) and (vi) occur with a conditional probability of at most $2^{-\kappa m} + n^{-\kappa}$.

(c) Furthermore, there is a constant $C_2$ (depending on $C', \kappa, C_1$) such that if $l \ge C_2$ and $M_{p,n}$ is conditioned as in (b), then (ii) and (vi) in fact occur with a conditional probability of at most $2^{-\kappa m} \log^{-2C_1} n + n^{-\kappa}$.

The proof of the above proposition repeats the proof of Proposition 53 in [13], with the major difference being that Theorem 4.1 and Theorem 4.2 are applied instead of Theorem 60 and Proposition 62 in [13].